Assignment 2 : Single View to 3D
...

Course - 16825 : Learning for 3D Vision
Name - Parth Nilesh Shah
AndrewId - pnshah

Date - 2/25/2024


1. Exploring Loss Functions
...

1.1 Fitting a voxel grid
...

Ground Truth | Optimized
vox_fir_tgt.gif | vox_fit_src.gif
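The report does not state which loss was used for the voxel fit. A common choice for occupancy grids is binary cross-entropy, so the sketch below assumes that. The function name `voxel_bce_loss` and the flat-list inputs are hypothetical and chosen only for illustration.

```python
import math

def voxel_bce_loss(logits, targets):
    """Mean binary cross-entropy between occupancy logits and 0/1 targets.

    Assumption: both arguments are flat lists with one entry per voxel.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# A confident, correct prediction gives a much lower loss than an undecided one.
confident = voxel_bce_loss([10.0, -10.0], [1.0, 0.0])
undecided = voxel_bce_loss([0.0, 0.0], [1.0, 0.0])
print(confident, undecided)
```

In an actual fitting loop the same quantity would be computed on tensors and minimized by gradient descent over the source grid.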

1.2 Fitting a pointcloud
...

Ground Truth | Optimized
point_fit_tgt.gif | point_fit_src.gif
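The loss used for the point cloud fit is not named in this report. Point sets are usually compared with a chamfer distance, so the sketch below assumes a symmetric version with squared distances averaged over each set. Other variants sum instead of average, or use unsquared distances. The name `chamfer_distance` and the brute-force search are illustrative only.

```python
def chamfer_distance(src, tgt):
    """Symmetric chamfer distance between two lists of 3D points (tuples)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def one_sided(a, b):
        # Mean squared distance from each point in a to its nearest point in b
        return sum(min(sq_dist(p, q) for q in b) for p in a) / len(a)

    return one_sided(src, tgt) + one_sided(tgt, src)

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
print(chamfer_distance(cloud, cloud), chamfer_distance(cloud, shifted))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for a sketch but too slow for clouds with thousands of points.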

1.3 Fitting a mesh
...

Ground Truth | Optimized
mesh_fit_tgt.gif | mesh_fit_src.gif

2. Reconstructing 3D from Single View
...

2.1 Image to voxel grid
...

Image | Ground Truth | Prediction
vox_40_img.png | vox_40_gt.gif | vox_40_pred.gif
vox_110_img.png | vox_110_gt.gif | vox_110_pred.gif
vox_470_img.png | vox_470_gt.gif | vox_470_pred.gif

2.2 Image to point cloud
...

Image | Ground Truth | Prediction
point_20_img.png | point_20_gt.gif | point_20_pred.gif
point_650_img.png | point_650_gt.gif | point_650_pred.gif
point_280_img.png | point_280_gt.gif | point_280_pred.gif

2.3 Image to mesh
...

Image | Ground Truth | Prediction
mesh_0_img.png | mesh_0_gt.gif | mesh_0_pred.gif
mesh_360_img.png | mesh_360_gt.gif | mesh_360_pred.gif
mesh_660_img.png | mesh_660_gt.gif | mesh_660_pred.gif

2.4 Quantitative Comparison
...

Type | Avg F1 Score @0.05
Voxel | 44.534
Pointcloud | 84.523
Mesh | 71.633

Note: These results were produced without the updated code, which is why the voxel prediction score is lower.
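The scores above are reported as F1 @ 0.05. A minimal sketch of how such a score is commonly computed on point sets follows. It assumes precision is the percentage of predicted points within 0.05 of some ground-truth point, and recall is the same measure in the other direction. For voxels and meshes this would presumably be applied to points sampled from the surface. The function name `f1_score_at` is hypothetical and this is not the course's evaluation code.

```python
import math

def f1_score_at(pred, gt, threshold=0.05):
    """F1 score (in percent) between two lists of 3D points at a distance threshold."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def percent_within(a, b):
        # Percentage of points in a whose nearest neighbour in b is closer than threshold
        hits = sum(1 for p in a if min(dist(p, q) for q in b) < threshold)
        return 100.0 * hits / len(a)

    precision = percent_within(pred, gt)
    recall = percent_within(gt, pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.01), (5.0, 5.0, 5.0)]
print(f1_score_at(pred, gt))  # one of two points matches in each direction
```

Because the score is the harmonic mean of precision and recall, a prediction needs to both stay close to the ground truth and cover all of it to score well.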

Evaluation Voxel
eval_vox.png
Evaluation Point Cloud
eval_point.png
Evaluation Mesh
eval_mesh.png

Intuitive Explanation

2.5 Analyse effects of hyperparameter variations
...

Experiment - Changing n_points in Pointcloud

The decoder architecture for my Image2Pointcloud model is as follows -
Linear(512, n_points) -> ReLU -> Linear(n_points, n_points * 3) -> Tanh

Since the width of the hidden layer depends on n_points, I varied n_points to see how it affects the results.

Image | GT | 500 | 2000 | 5000
F1score@0.05 | - | 71.354 | 79.635 | 84.621
point_0_img.png | point_0_gt.gif | point_0_500.gif | point_0_2000.gif | point_0_5000.gif
point_400_img.png | point_400_gt.gif | point_400_500.gif | point_0_2000.gif | point_400_5000.gif

The F1 score improves as the number of points increases, from 71.354 at 500 points to 84.621 at 5000 points.
My interpretation is that a larger n_points also widens the hidden layer, which adds connections and makes the network more expressive.
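The link between n_points and capacity can be made concrete by counting the decoder's parameters. The sketch below assumes both linear layers carry bias terms, as is the default for linear layers in common frameworks, and that the input feature has 512 dimensions as stated in the architecture. The helper name `decoder_param_count` is hypothetical.

```python
def decoder_param_count(n_points, feat_dim=512):
    """Parameters in Linear(feat_dim, n_points) -> Linear(n_points, n_points * 3)."""
    first = feat_dim * n_points + n_points             # weights + biases
    second = n_points * (n_points * 3) + n_points * 3  # weights + biases
    return first + second

for n in (500, 2000, 5000):
    print(n, decoder_param_count(n))
# 500 1008000
# 2000 13032000
# 5000 77580000
```

The second layer grows quadratically with n_points, so going from 500 to 5000 points multiplies the decoder size by roughly 77. This changes model capacity at the same time as output resolution, and the experiment as described does not separate the two effects.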

2.6 Interpret your model
...


3. Exploring other architectures / datasets
...

3.3 Extended dataset for training
...

I ran an experiment in which I trained a point cloud prediction network on the full dataset.

Qualitative Results -

Image | Ground Truth | Prediction
point_200_img.png | point_200_gt.gif | point_200_pred.gif
point_500_img.png | point_500_gt.gif | point_500_pred.gif
point_800_img.png | point_800_gt.gif | point_800_pred.gif

Quantitative Results -
avg F1 @ 0.05 = 91.45%

Evaluation on 3 classes
eval_point_full.png

A couple of observations: on the full dataset the loss took around 20000 iterations to converge, nearly three times as many as when training on a single class.
Qualitatively, the predictor seems to do better on the plane and car classes than on chairs.